Improvement in corpus-based generation of F0 contours using generation process model for emotional speech synthesis
Abstract
In our fully automatic corpus-based method of generating fundamental frequency (F0) contours for emotional speech synthesis, an improvement was made to the corpus preparation process. The method assumes the generation process model and predicts its command parameters using binary regression trees, taking as input the linguistic information of the sentence to be synthesized. Because of the model constraint, a certain quality is maintained in the synthesized speech even when the prediction is incorrect. The speech corpus includes three types of emotional speech (anger, joy, sadness) and calm speech uttered by a female narrator. The command parameters necessary for training (and testing) the method were automatically extracted from speech using a program developed by the authors. Since the accuracy of the extraction largely affects the prediction performance, a constraint is newly applied to the position of phrase commands during extraction. Also, since the performance of phrase command prediction dominates the overall accuracy of the generated F0 contours, the method was modified to predict phrase commands first. The mismatches between the predicted and target contours for angry speech were similar to those for calm speech. Emotional speech was synthesized from text input. The segmental features were handled by the HMM synthesis method, and the phoneme durations were predicted by a similar corpus-based method. A perceptual experiment was conducted using the synthesized speech, and the results indicated that anger could be well conveyed by the developed method. The results were worse for joy and sadness.
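The abstract does not give the equations of the generation process model. As a rough illustration only, the sketch below assumes the commonly cited form of the model, in which log F0 is a baseline value plus the responses to impulse-like phrase commands and step-like accent commands. The constants `ALPHA`, `BETA`, and `GAMMA`, the function names, and the example command values are all assumptions chosen for illustration. They are not taken from this paper, and the code is not the authors' extraction or prediction program.

```python
import math

# Assumed typical constants (not stated in the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset, magnitude) pairs.
    accent_cmds: list of (onset, offset, amplitude) triples.
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical command values: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 200.0, [(0.0, 0.5)], [(0.3, 0.7, 0.4)])
```

Under these assumptions, an error in a predicted command shifts or rescales a smooth component instead of producing frame-by-frame jumps, which is one way to read the statement that a certain quality is kept even when the prediction is incorrect.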
Similar resources
Corpus-based Synthesis of F0 Conto Using the Generation P
A corpus-based generation of fundamental frequency (F0) contours was realized for emotional speech synthesis. The method, originally developed for read speech, is to predict command values of the F0 contour generation process model with the input of linguistic information of the sentence to be synthesized. Since the generated F0 contour is under the model constraint, a certain quality is still ...
Corpus-based Generation of Fundamental Fr Process Model and Considerin
We formerly conducted emotional speech synthesis using our corpus-based method of generating fundamental frequency (F0) contours from text. The method predicts command values of the F0 contour generation process model instead of directly predicting the F0 value of each time frame. Better control of F0 contours was realized by taking the emotional level of each bunsetsu into account: adding informatio...
Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis
The generation process model of fundamental frequency (F0) contours can represent the F0 movements of speech well while keeping a clear relation with the linguistic information of utterances. Therefore, using the model is expected to improve HMM-based speech synthesis. One of the major problems preventing the use of the model is that the performance of automatic extraction of the model parameters from observ...
Emotional Speech Synthesis with Corpus-Based Generation of F0 Contours Using Generation Process Model
A method was developed for the corpus-based synthesis of emotional speech. Fundamental frequency (F0) contours were synthesized by predicting command values of the generation process model using binary regression trees with the input of linguistic information of the sentence to be synthesized. Because of the model constraint, a certain quality is still kept in synthesized speech even if the pre...
Corpus-based synthesis of fundamental frequency contours with various speaking styles from text using F0 contour generation process model
A corpus-based method of generating fundamental frequency (F0) contours of various speaking styles from text was developed. Instead of directly predicting F0 values, the method predicts command values of the F0 contour generation process model. Because of the model constraint, the resulting F0 contour keeps a certain naturalness even when the prediction is incorrect. The method includes a ...
Publication date: 2004